
Link resolution #458

Merged
16 commits merged on Nov 14, 2016

Conversation

@mpacer (Member) commented Nov 5, 2016

Closes #11 by passing an intermediate JSON representation through a filter that directly manipulates the underlying pandoc AST, as part of a two-step conversion process.

It also adds a general utility for walking this representation in other cases, based on the https://github.com/jgm/pandocfilters library: a new function (applyJSONFilters) that we mirror until/if jgm/pandocfilters#49 is merged.
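
For context, a pandoc JSON filter action of the kind this PR uses is just a callable with the signature (key, value, format, meta): returning a new node replaces the matched element, and returning None leaves it untouched. A minimal sketch of the mechanism (the action below is a simplified, illustrative stand-in for the real resolve_one_reference added in this PR):

from pandocfilters import applyJSONFilters, RawInline

def section_links_to_refs(key, value, format, meta):
    # Rewrite internal anchors like [text](#sec-id) as LaTeX \ref{}s;
    # returning None for everything else leaves those nodes unchanged.
    if key == 'Link':
        target = value[2][0]   # pandoc >= 1.16: Link is [attr, inlines, [url, title]]
        if target.startswith('#'):
            return RawInline('tex', 'Section \\ref{%s}' % target[1:])

# ast_json is the JSON string produced by a pandoc run with -t json;
# applyJSONFilters walks the AST and applies each action in turn:
# filtered_json = applyJSONFilters([section_links_to_refs], ast_json, format='latex')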

@takluyver

if key == 'Link':
    target = val[2][0]
    # Links to other notebooks
    m = re.match(r'(\d+\-.+)\.ipynb$', target)
takluyver (Member):

The cross-notebook linking only works as part of bookbook, so we should probably take it out of here to avoid confusion.

return RawInline('tex', 'Section \\ref{sec:%s}' % m.group(1))

# Links to sections of this or other notebooks
m = re.match(r'(\d+\-.+\.ipynb)?#(.+)$', target)
takluyver (Member):

Ditto here: the first part of this only makes sense for bookbook, which joins multiple notebooks together.

# mydir = os.path.dirname(os.path.abspath(__file__))
# filter_links = os.path.join(mydir, 'filter_links.py')
# if extra_args is not None:
#     extra_args.extend(['--filter',filter_links])
takluyver (Member):

You probably didn't mean to commit this?

mpacer (Member, Author):

Yes, I meant to delete that; it's left over from the efficiency prototyping.

@@ -64,7 +64,7 @@

% Render markdown
((* block markdowncell scoped *))
- ((( cell.source | citation2latex | strip_files_prefix | convert_pandoc('markdown', 'latex') )))
+ ((( cell.source | citation2latex | strip_files_prefix | convert_pandoc('markdown', 'json',extra_args=[]) | wrapped_convert_link | convert_pandoc('json','latex'))))
takluyver (Member):

Open API question - do we want to have a function which encapsulates these last three steps into one:

  • pandoc to json
  • apply Python filters
  • pandoc from json
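
For illustration, a wrapper around those three steps might look roughly like the sketch below; the function name convert_pandoc_with_filters is hypothetical, while nbconvert's pandoc() helper and pandocfilters.applyJSONFilters are the pieces this PR already relies on.

from nbconvert.utils.pandoc import pandoc
from pandocfilters import applyJSONFilters

def convert_pandoc_with_filters(source, from_format, to_format, actions,
                                extra_args=None):
    # pandoc to JSON, apply the Python filter actions, pandoc from JSON.
    ast_json = pandoc(source, from_format, 'json', extra_args=extra_args)
    filtered = applyJSONFilters(actions, ast_json, format=to_format)
    return pandoc(filtered, 'json', to_format, extra_args=extra_args)

This is essentially what the template pipeline above now spells out inline.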

mpacer (Member, Author):

I was thinking about doing that, but I figured that should be a different PR that demonstrates the generality of this, whereas this is more of a bugfix.

mpacer (Member, Author):

New issue #462 opened to address this.

@@ -109,6 +109,28 @@ def check_pandoc_version():
RuntimeWarning, stacklevel=2)
return ok

try:
    from pandocfilters import applyJSONFilters
except ImportError:
takluyver (Member):

John did a new release of pandocfilters for us, so we can rely on >= 1.4.1 and drop the fallback here.
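
Assuming pandocfilters >= 1.4.1 becomes a hard dependency (for example, declared in setup.py's install_requires), the fallback would reduce to a plain import, roughly:

# requires pandocfilters >= 1.4.1, which ships applyJSONFilters
from pandocfilters import applyJSONFilters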

mpacer (Member, Author):

Sounds good.

@@ -50,6 +50,7 @@
'prevent_list_blocks': filters.prevent_list_blocks,
'get_metadata': filters.get_metadata,
'convert_pandoc': filters.convert_pandoc,
'wrapped_convert_link': wrapped_convert_link,
takluyver (Member):

I'd like the name here to be a bit more descriptive: wrapped is an implementation detail, and it's not clear what it means (wrapped in what?). A descriptive name might be something like pandoc_internal_links_to_latex_refs. That is a bit wordy, so better suggestions are welcome.

I also wonder if it should only be registered for the latex exporter - then the name could be a bit less precise. Adding a filter for an exporter works like this (example from bookbook):

def default_filters(self):
    yield from super(MyHTMLExporter, self).default_filters()
    yield ('markdown2html', markdown2html_custom)

mpacer (Member, Author):

Should be addressed

mpacer (Member, Author) commented Nov 11, 2016:

Hmm, I don't think that this idiom works exactly as-is (also, I think this is not quite how bookbook does it; see https://github.com/takluyver/bookbook/blob/8d7014d8e9775789ce62d092921aa56898da66c0/bookbook/html.py#L29). I'll work on it.

@@ -109,6 +109,22 @@ def check_pandoc_version():
RuntimeWarning, stacklevel=2)
return ok

def applyJSONFilters(actions, source, format=""):
takluyver (Member):

We shouldn't need this function now that it's in pandocfilters.


def resolve_one_reference(key, val, fmt, meta):
    """
    """
takluyver (Member):

Let's either put a docstring in or remove the empty docstring (I'm happy for this to have no docstring).

@@ -43,6 +43,10 @@ def _template_skeleton_path_default(self):
template_extension = Unicode(".tplx").tag(config=True)

output_mimetype = 'text/latex'

def default_filters(self):
    yield from super(TemplateExporter,self).default_filters()
takluyver (Member):

We should be passing in the current class here, i.e. LatexExporter, not TemplateExporter.

Python 3 improved on this, so you can just call super().default_filters() as we do in bookbook. But nbconvert still has to support Python 2 for now.

mpacer (Member, Author):

For some reason, even when running in my Python 3 environment, that doesn't work with the current code.

However, the Python 2 compatible way of doing it does work. Any idea why?

takluyver (Member):

Not sure off the top of my head. If you're interested in working it out, we can try to debug on the Hacktrain tomorrow.

@mpacer (Member, Author) commented Nov 12, 2016

I do not like that this causes a ridiculous slowdown in the tests. I imagine this will negatively impact anyone who is converting a large number of notebooks.

I'm OK merging this, but we should then make it a priority to figure out whether we can do some kind of conditional filtering: for example, adding a piece of metadata to a cell if it contains a link (or even specifically an internal link) as part of a post-processing step when saving the notebook. I'm not sure, but my impression is that this might belong in the https://github.com/jupyter/nbformat repo? Or, if that does no analysis of notebook content, as part of the saving mechanism inside the notebook itself.
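
Purely as an illustration of that conditional-filtering idea (not something this PR implements, and the has_link metadata key is made up), a save-time post-processing step could look something like:

import re

# Crude check for inline markdown link syntax; a real implementation would
# need to handle reference-style links, autolinks, etc.
LINK_RE = re.compile(r'\[[^\]]*\]\([^)]*\)')

def tag_cells_with_links(nb):
    # Record in each markdown cell's metadata whether it contains a link,
    # so an exporter could skip the pandoc round-trip for cells that don't.
    for cell in nb.cells:
        if cell.cell_type == 'markdown':
            cell.metadata['has_link'] = bool(LINK_RE.search(cell.source))
    return nb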

@@ -44,6 +44,10 @@ def _template_skeleton_path_default(self):

output_mimetype = 'text/latex'

def default_filters(self):
    yield from super(LatexExporter, self).default_filters()
takluyver (Member):

Sorry, I forgot when I gave the example that yield from is Python 3 only. To make this work on Python 2, we'll need to expand it:

for f in super(LatexExporter, self).default_filters():
    yield f

@takluyver (Member):

Well, we knew it was going to be slower, right? We can look at ways to mitigate that, like maybe offering an option to skip the link processing entirely.

I don't like the idea of doing stuff on save to support this:

  • It's extra complexity and extra slowness on every save for something which is only relevant for converting notebooks to latex. Most notebooks will never be converted to latex.
  • It requires a change in every tool writing notebooks (e.g. nteract, Pycharm), and they would have to inspect Markdown, which is currently unnecessary.
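
The "option to skip the link processing entirely" mentioned above could, for instance, be a traitlets flag on the exporter; the trait name below is hypothetical, not part of this PR:

from traitlets import Bool
from nbconvert.exporters import LatexExporter

class OptionalLinksLatexExporter(LatexExporter):
    # Hypothetical opt-out: when False, the exporter could fall back to the
    # single-step markdown -> latex conversion and skip link resolution.
    resolve_links = Bool(True,
        help="Rewrite internal notebook links as LaTeX references."
    ).tag(config=True)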
